21 research outputs found

    El aprendizaje semisupervisado como superación en precisión del aprendizaje supervisado en Desambiguación del Sentido de las Palabras

    A new semi-supervised bootstrapping algorithm for Word Sense Disambiguation has been developed that greatly alleviates the Knowledge Acquisition Bottleneck, which severely affects current supervised algorithms. It is shown that Word Sense Disambiguation algorithms achieve much lower precision on balanced general-text corpora (real corpora) than on newspaper corpora, owing to the stereotyped and repetitive nature of the latter. The new bootstrapping algorithm matches the precision of supervised algorithms on real, non-newspaper corpora and can potentially surpass them. This is due to a binary decision methodology combined with the one-sense-per-discourse (OSPD) property of natural language, and to the greater flexibility of bootstrapping (semi-supervised) algorithms compared with supervised ones, which lets them cope much better with the domain fluctuations found in real general-text corpora.
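The one-sense-per-discourse property mentioned in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the function name, the tuple layout and the majority-vote rule are assumptions chosen for illustration. The idea is that, inside one document, every occurrence of an ambiguous word is given the sense that was assigned most often to its already-labelled occurrences.

```python
from collections import Counter


def apply_one_sense_per_discourse(tagged_occurrences):
    """Propagate the majority sense of a word within each document.

    tagged_occurrences: list of (doc_id, word, sense) tuples, where sense is
    None for occurrences that have not been labelled yet (hypothetical layout).
    """
    # Count the senses already assigned to each (document, word) pair.
    votes = {}
    for doc_id, word, sense in tagged_occurrences:
        if sense is not None:
            votes.setdefault((doc_id, word), Counter())[sense] += 1

    # Relabel every occurrence with the majority sense of its document.
    result = []
    for doc_id, word, sense in tagged_occurrences:
        counter = votes.get((doc_id, word))
        if counter:
            sense = counter.most_common(1)[0][0]
        result.append((doc_id, word, sense))
    return result


occurrences = [
    ("d1", "bank", "finance"),
    ("d1", "bank", None),
    ("d1", "bank", "finance"),
    ("d1", "bank", "river"),
    ("d2", "bank", None),
]
relabelled = apply_one_sense_per_discourse(occurrences)
```

In a bootstrapping loop, a step like this would typically run after each round of automatic labelling, so that confident decisions spread to the remaining occurrences in the same document.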

    Discovering HIV related information by means of association rules and machine learning

    Acquired immunodeficiency syndrome (AIDS) is still one of the main health problems worldwide. It is therefore essential to keep making progress in improving the prognosis and quality of life of affected patients. One way to advance along this pathway is to uncover connections between other disorders associated with HIV/AIDS so that they can be anticipated and possibly mitigated. We propose to achieve this by using Association Rules (ARs), which allow us to represent the dependencies between a number of diseases and other specific diseases. However, classical techniques systematically generate every AR meeting some minimal conditions on data frequency, hence generating a vast number of uninteresting ARs, which need to be filtered out. The lack of manually annotated ARs has favored unsupervised filtering, even though it produces limited results. In this paper, we propose a semi-supervised system able to identify relevant ARs among HIV-related diseases with a minimal amount of annotated training data. Our system has been able to extract a good number of relationships between HIV-related diseases that have been previously detected in the literature but are scattered and often little known. 
Furthermore, a number of plausible new relationships have shown up which deserve further investigation by qualified medical experts. This study has been partially supported by the Spanish Ministry of Science and Innovation within the DOTTHEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, the OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and UE (“NextGenerationEU”/PRTR)) under Grant TED2021-130398B-C21 and the project RAICES (IMIENS 2022), PI18CIII/00004 “Infobanco para uso secundario de datos basado en estándares de tecnología y conocimiento: implementación y evaluación de un infobanco de salud para CoRIS (Info-bank for the secondary use of data based on technology and knowledge standards: implementation and evaluation of a health info-bank for CoRIS) - SmartPITeS” and PI18CIII/00019 - PI18/00890 - PI18/00981 “Arquitectura normalizada de datos clínicos para la generación de infobancos y su uso secundario en investigación: solución tecnológica (Clinical data normalized architecture for the generation of info-banks and their secondary use in research: technological solution) - CAMAMA 4” from Fondo de Investigación Sanitaria (FIS) Plan Nacional de I+D+i. The RIS cohort (CoRIS) is supported by the Instituto de Salud Carlos III through the Red Temática de Investigación Cooperativa en Sida (RD06/006, RD12/0017/0018 and RD16/0002/0006) as part of the Plan Nacional R+D+I and co-financed by ISCIII-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER). The list of members of the Cohort of the Spanish HIV Research Network (CoRIS) is included in the Supplementary Material. Additional relationships between HIV-related diseases confirmed or discarded are included as Supplementary Material. This study would not have been possible without the collaboration of all patients, medical and nursing staff and data managers who have taken part in the Project.
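As a rough illustration of the "minimal conditions on data frequency" that classical association-rule mining applies, the sketch below computes the standard support, confidence and lift of a single rule. The patient records and disease labels are invented placeholders, and the function name is an assumption. The semi-supervised filtering described in the abstract is not reproduced here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items, here one set of diagnoses per patient.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n                          # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Toy records with placeholder labels (not real clinical data).
patients = [
    {"disease_a", "disease_b"},
    {"disease_a", "disease_b", "disease_c"},
    {"disease_a"},
    {"disease_b", "disease_c"},
]
support, confidence, lift = rule_metrics(patients, {"disease_a"}, {"disease_b"})
```

A classical miner keeps every rule whose support and confidence exceed fixed thresholds, which is why so many uninteresting rules survive and a further filtering stage is needed.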

    Normalized medical information visualization

    A new mark-up programming language is introduced in order to facilitate and improve the visualization of ISO/EN 13606 dual model-based normalized medical information. This is the first time that visualization of normalized medical information is addressed, and the programming language is intended to be used by medical non-IT professionals.

    Service for the pseudonymization of electronic healthcare records based on ISO/EN 13606 for the secondary use of information

    The availability of electronic health data favors scientific advance through the creation of repositories for secondary use. Data anonymization is a mandatory step to comply with current legislation. A service for the pseudonymization of electronic healthcare record (EHR) extracts is presented, aimed at facilitating the exchange of clinical information for secondary use in compliance with legislation on data protection. According to ISO/TS 25237, pseudonymization is a particular type of anonymization. This tool performs the anonymization while maintaining three quasi-identifiers (gender, date of birth and place of residence) with a degree of specification selected by the user. The developed system is based on the ISO/EN 13606 norm and uses those of its characteristics that are specifically favorable for anonymization. The service is made up of two independent modules: the demographic server and the pseudonymizing module. The demographic server supports the permanent storage of the demographic entities and the management of the identifiers. The pseudonymizing module anonymizes the ISO/EN 13606 extracts. The pseudonymizing process consists of four phases: the storage of the demographic information included in the extract, the substitution of the identifiers, the elimination of the demographic information from the extract and the elimination of key data in free-text fields. The described pseudonymizing system was used in three telemedicine research projects with satisfactory results. A problem was detected with the type of data in a demographic data field, and a proposal for modification was prepared for the group in charge of drawing up and revising the ISO/EN 13606 norm.
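The four phases listed in this abstract can be sketched as follows. This is only a toy under stated assumptions: a real ISO/EN 13606 extract is a structured document rather than a flat dictionary, and the class `DemographicServer`, the functions, the field names and the `[REDACTED]` marker are all hypothetical. Only the date of birth is generalized here, to show a user-selected degree of specification.

```python
import re
import uuid


class DemographicServer:
    """Hypothetical stand-in for the demographic server: stores demographic
    entities permanently and manages the identifier-to-pseudonym mapping."""

    def __init__(self):
        self._pseudonym_by_identifier = {}
        self._demographics_by_pseudonym = {}

    def register(self, identifier, demographics):
        # Reuse the pseudonym if this patient has been seen before.
        if identifier not in self._pseudonym_by_identifier:
            pseudonym = uuid.uuid4().hex
            self._pseudonym_by_identifier[identifier] = pseudonym
            self._demographics_by_pseudonym[pseudonym] = dict(demographics)
        return self._pseudonym_by_identifier[identifier]


def generalize_birth_date(iso_date, level):
    """Keep the date of birth at the chosen precision: year, month or day."""
    return {"year": iso_date[:4], "month": iso_date[:7], "day": iso_date}[level]


def pseudonymize(extract, server, birth_level="year"):
    demo = extract["demographics"]
    # Phases 1 and 2: store the demographics, substitute the identifier.
    pseudonym = server.register(extract["patient_id"], demo)
    # Phase 3: drop the demographics, keeping only the quasi-identifiers.
    quasi = {
        "gender": demo["gender"],
        "birth_date": generalize_birth_date(demo["birth_date"], birth_level),
        "residence": demo["residence"],
    }
    # Phase 4: remove key data (name, identifier) from the free text.
    text = extract["free_text"]
    for value in (demo["name"], extract["patient_id"]):
        text = re.sub(re.escape(value), "[REDACTED]", text, flags=re.IGNORECASE)
    return {"patient_id": pseudonym, "quasi_identifiers": quasi, "free_text": text}


server = DemographicServer()
extract = {
    "patient_id": "ID-0001",
    "demographics": {"name": "Jane Doe", "gender": "F",
                     "birth_date": "1980-05-17", "residence": "Madrid"},
    "free_text": "Jane Doe (ID-0001) reports mild symptoms.",
}
safe_extract = pseudonymize(extract, server, birth_level="year")
```

Keeping the identifier mapping inside a separate server object mirrors the two-module split described above: the pseudonymizing step never needs to expose the original identifiers to downstream users.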

    Aplicación de técnicas de Ingeniería Lingüística en sistemas de e-learning basados en objetos de aprendizaje

    II Simposio Pluridisciplinar sobre Diseño, Evaluación y Descripción de Contenidos Educativos Reutilizables (SPDECE), Barcelona, Spain, 19/10/2005-21/10/2005. Three possible ways are presented of applying techniques derived from textual information processing to e-learning systems based on reusable learning objects: the automatic generation of metadata (LOM, IMS-MD, SCORM, Dublin Core) from teaching resources, the automatic generation of assessment questionnaires (IMS-QTI), and the construction of linguistic search engines for learning objects in standardized repositories (IMS-DRI) and in ontology-based semantic repositories. Ministerio de Industri

    Machine learning-based model for prediction of clinical deterioration in hospitalized patients by COVID 19

    Despite the publication of a great number of tools to aid decisions in COVID-19 patients, there is a lack of good instruments to predict clinical deterioration. COVID19-Osakidetza is a prospective cohort study recruiting COVID-19 patients. We collected information from baseline to discharge on sociodemographic characteristics, comorbidities and associated medications, vital signs, treatment received and lab test results. The outcome was need for intensive ventilatory support (with at least standard high-flow oxygen face mask with a reservoir bag for at least 6 h and need for more intensive therapy afterwards or Optiflow high-flow nasal cannula or noninvasive or invasive mechanical ventilation) and/or admission to a critical care unit and/or death during hospitalization. We developed a CatBoost model and summarized its findings using Shapley Additive Explanations. Performance of the model was assessed using the areas under the receiver operating characteristic and precision-recall curves (AUROC and AUPRC respectively), and calibration was assessed using the Hosmer-Lemeshow test. Overall, 1568 patients were included in the derivation cohort and 956 in the (external) validation cohort. The percentages of patients who reached the composite endpoint were 23.3% vs 20% respectively. The strongest predictors of clinical deterioration were arterial blood oxygen pressure, followed by age, levels of several markers of inflammation (procalcitonin, LDH, CRP) and alterations in blood count and coagulation. Some medications, namely ATC A02 (antacids) and N05 (neuroleptics), were also among the group of main predictors, together with C03 (diuretics). In the validation set, the CatBoost AUROC was 0.79, AUPRC 0.21 and Hosmer-Lemeshow test statistic 0.36. We present a machine learning-based prediction model with excellent performance properties to implement in EHRs. 
Our main goal was to predict progression to a score of 5 or higher on the WHO Clinical Progression Scale before patients required mechanical ventilation. Future steps are to externally validate the model in other settings and in a cohort from a different period and to apply the algorithm in clinical practice. Registration: ClinicalTrials.gov Identifier: NCT04463706. This work was supported in part by grants from the Instituto de Salud Carlos III and the European Regional Development Fund COVID20/00459; the health outcomes group from Galdakao-Barrualde Health Organization; the Kronikgune Institute for Health Service Research; and the thematic network REDISSEC (Red de Investigación en Servicios de Salud en Enfermedades Crónicas) of the Instituto de Salud Carlos III. The funder of the study had no role in study design, data collection, analysis, management or interpretation, or writing of the report.
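For readers unfamiliar with the two discrimination metrics reported in this abstract, the sketch below computes them from scratch on invented scores. It is not the study's evaluation code. AUROC is computed as the fraction of positive/negative pairs ranked correctly, and AUPRC is approximated by average precision, one common estimator (the abstract does not say which estimator was used). It assumes both classes are present and, for average precision, that scores have no ties.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented toy predictions: 1 = deteriorated, 0 = did not deteriorate.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(auroc(labels, scores), average_precision(labels, scores))
```

Unlike AUROC, whose chance level is 0.5, the chance level of AUPRC equals the prevalence of the positive class, so the two figures are not read on the same scale.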

    A study protocol for development and validation of a clinical prediction model for frailty (ModulEn): a new European commitment to tackling frailty

    There is a growing need to implement and evaluate technological solutions that allow the early detection of age-related frailty and enable assessment of the predictive values of frailty components. The broad use of these solutions may ensure an efficient and sustainable response of health and social care systems to the challenges related to demographic aging. In this paper, we present the protocol of the ModulEn study, which aims to develop and validate a predictive model for frailty. For this purpose, a sample composed of older adults aged 65-80 years and recruited from the community will be invited to use the electronic device ACM Kronowise® 2.0. This device allows proactive and continuous monitoring of circadian health, physical activity, and sleep and eating habits. It will be used for a period of seven to ten days. The participants will also be given questionnaires evaluating the variables of interest, including frailty level, as well as their experience of and satisfaction with the device. Data from these two sources will be combined and the relevant associations will be identified. In our view, the implications of this study's findings for clinical practice include the possibility of developing and validating tools for timely prevention of frailty progression. In the long term, the ModulEn study may contribute to a critical reduction of the frailty burden in Europe.

    Supplementary files for Examining database persistence of ISO/EN 13606 standardized Electronic Health Record extracts: relational vs. noSQL approaches

    Three code files, written in SQL, Java and XQuery, that perform six queries on MySQL, MongoDB and eXist databases respectively, as described in the paper entitled "Examining database persistence of ISO/EN 13606 standardized Electronic Health Record extracts: relational vs. noSQL approaches".